Skip to content

fix: don't mutate shared yaml.Node when rendering schemas concurrently#594

Merged
daveshanley merged 2 commits into
pb33f:mainfrom
gnuletik:fix/concurrent-render-shared-node-mutation
Jun 24, 2026
Merged

fix: don't mutate shared yaml.Node when rendering schemas concurrently#594
daveshanley merged 2 commits into
pb33f:mainfrom
gnuletik:fix/concurrent-render-shared-node-mutation

Conversation

@gnuletik

@gnuletik gnuletik commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Fixes daveshanley/vacuum#912

Summary

NodeBuilder.AddYAMLNode calls (*yaml.Node).Encode(value) where value is frequently a *yaml.Node owned by the model. Encode runs the value through the desolve stage, which rewrites Tag/Style in place, and the representer returns input *yaml.Nodes as-is (nodev returns the node unchanged). So rendering a schema mutates the shared node it was built from.

When schemas are rendered concurrently — e.g. a linter running rules in parallel — that mutation races with any goroutine reading the same node. A common symptom: a string scalar's tag is briefly elided to "" mid-render, so a concurrent reader checking node.Tag == "!!str" sees the wrong type and reports a bogus const value type does not match schema type error.

This fixes it by encoding a deep copy whenever the value is a *yaml.Node, so the desolve mutation lands on a throwaway copy and the model's nodes stay immutable.

When this regressed

The shared-node read path is not new — the trigger is a change in go.yaml.in/yaml/v4's (*Node).Encode, pulled in when libopenapi went from v0.37.x to v0.38.0.

  • go.yaml.in/yaml/v4 rc.4 (used up to libopenapi v0.37.2): Encode marshalled the value to YAML bytes and re-parsed them into fresh nodes. It never touched the caller's node. There is no desolver.go in rc.4.
  • go.yaml.in/yaml/v4 rc.5 (pulled in by libopenapi v0.38.0): Encode was rewritten into a RepresentDesolveSerialize pipeline. The new Desolve stage elides inferable tags (n.Tag = "") and normalizes quote style on the represented graph, and Represent/nodev returns input *Node values as-is (aliased). So Encode now mutates caller-owned nodes.

Because libopenapi hands shared model nodes to Encode during inline schema rendering, that new in-place mutation began landing on shared nodes and racing with concurrent readers. This is why the downstream symptom (vacuum #912) appears exactly at vacuum v0.29.3 / libopenapi v0.38.0 and not before.

Downstream confirmation in vacuum (which renders schemas concurrently across rules): GOMAXPROCS=1 makes the false error disappear, vacuum v0.29.2 (libopenapi v0.37.2 / yaml rc.4) never reproduces it, and v0.29.3+ does.

Changes

  • datamodel/high/node_builder.go: add encodeSafeValue (deep-copies *yaml.Node values) and deepCopyYAMLNode; route both rawNode.Encode(value) call sites through encodeSafeValue.
  • datamodel/high/base/schema_proxy_test.go: add TestSchemaProxy_ConcurrentRender_DoesNotMutateSharedNodes, which renders a schema concurrently while reading a captured const scalar's tag and asserts the tag is never mutated.
  • datamodel/high/node_builder_test.go: add TestEncodeSafeValue_DeepCopiesYAMLNodes, a unit test covering the passthrough, nil, alias and content branches of the new helpers.

Reproduction

The race detector flags it on main:

WARNING: DATA RACE
Write by goroutine N:
  go.yaml.in/yaml/v4 ...Desolver.desolveScalar   desolver.go:98   (writes n.Tag/n.Style)
  go.yaml.in/yaml/v4 ...(*Node).Encode           node.go
  libopenapi ...(*NodeBuilder).AddYAMLNode       node_builder.go:648
  libopenapi ...(*Schema).MarshalYAMLInlineWithContext
Previous read by goroutine M:
  ...reads the same node's Tag

Verification

check result
new test, go test -race passes with fix; data race + FAIL without it
go test ./datamodel/high/... pass
go test ./renderer/... ./bundler/... pass

Notes

  • deepCopyYAMLNode adds one allocation per encoded *yaml.Node; negligible next to the represent/desolve/serialize/recompose round-trip Encode already performs, and only on the *yaml.Node path.
  • There is also an argument that (*yaml.Node).Encode in go.yaml.in/yaml/v4 should not mutate caller-owned nodes at all — rc.4 did not. That would fix every caller and restore the previous contract. This PR keeps the change inside libopenapi so it doesn't depend on an upstream yaml release.

🤖 Generated with Claude Code

@codecov

codecov Bot commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.75%. Comparing base (0fc7fa0) to head (8972039).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #594   +/-   ##
=======================================
  Coverage   99.75%   99.75%           
=======================================
  Files         280      280           
  Lines       34157    34172   +15     
=======================================
+ Hits        34073    34088   +15     
  Misses         51       51           
  Partials       33       33           
Flag Coverage Δ
unittests 99.75% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Adds a unit test exercising the passthrough, nil, alias and content
branches of the new node-copy helpers, bringing patch coverage to 100%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@daveshanley daveshanley left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@daveshanley daveshanley merged commit a3dd56b into pb33f:main Jun 24, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

oas-schema-check intermittently reports a false const type error (data race, since v0.29.3)

2 participants